Noise Dressing of Financial Correlation Matrices 






Laurent Laloux(1), Pierre Cizeau(1), Jean-Philippe Bouchaud(1,2) and Marc Potters(1)
(1) Science & Finance, 109-111 rue Victor Hugo, 92532 Levallois Cedex, FRANCE
(2) Service de Physique de l'État Condensé, Centre d'études de Saclay,
Orme des Merisiers, 91191 Gif-sur-Yvette Cedex, FRANCE
(February 1, 2008) 

We show that results from the theory of random matrices are potentially of great interest to 
understand the statistical structure of the empirical correlation matrices appearing in the study of 
price fluctuations. The central result of the present study is the remarkable agreement between the 
theoretical prediction (based on the assumption that the correlation matrix is random) and empirical 
data concerning the density of eigenvalues associated with the time series of the different stocks of the
S&P500 (or other major markets). In particular, the present study raises serious doubts about the blind
use of empirical correlation matrices for risk management.
An important aspect of risk management is the estimation
of the correlations between the price movements
of different assets. The probability of large losses for a 
certain portfolio or option book is dominated by corre- 
lated moves of its different constituents - for example, a 
position which is simultaneously long in stocks and short 
in bonds will be risky because stocks and bonds move in 
opposite directions in crisis periods. The study of cor- 
relation (or covariance) matrices thus has a long history 
in finance, and is one of the cornerstones of Markowitz's
theory of optimal portfolios [ref]. However, a reliable empirical
determination of a correlation matrix turns out
to be difficult: if one considers N assets, the correlation
matrix contains N(N - 1)/2 entries, which must be
determined from N time series of length T; if T is not 
very large compared to N, one should expect that the 
determination of the covariances is noisy, and therefore 
that the empirical correlation matrix is to a large extent 
random, i.e. the structure of the matrix is dominated by 
measurement noise. If this is the case, one should be 
very careful when using this correlation matrix in appli- 
cations. In particular, as we shall show below, the small- 
est eigenvalues of this matrix are the most sensitive to 
this 'noise' - on the other hand, it is precisely the eigen- 
vectors corresponding to these smallest eigenvalues which 
determine, in Markowitz theory, the least risky portfolios [ref].
It is thus important to devise methods which allow
one to distinguish 'signal' from 'noise', i.e. eigenvectors 
and eigenvalues of the correlation matrix containing real 
information (which one would like to include for risk con- 
trol), from those which are devoid of any useful informa- 
tion, and, as such, unstable in time. From this point of 
view, it is interesting to compare the properties of an em- 
pirical correlation matrix C to a 'null hypothesis' purely 
random matrix as one could obtain from a finite time se- 
ries of strictly uncorrelated assets. Deviations from the 
random matrix case might then suggest the presence of 
true information. The theory of Random Matrices has 
a long history in physics since the fifties [ref], and many
results are known [ref]. As shown below, these results are
also of genuine interest in a financial context (see also [ref]).

The empirical correlation matrix C is constructed from
the time series of price changes* $\delta x_i(t)$ (where i labels the
asset and t the time) through the equation:

$$C_{ij} = \frac{1}{T} \sum_{t=1}^{T} \delta x_i(t)\, \delta x_j(t). \qquad (1)$$

We can symbolically write Eq. (1) as $C = \frac{1}{T} M M^T$,
where M is an N × T rectangular matrix, and $M^T$ denotes
its transpose. The null hypothesis of uncorrelated
assets, which we consider now, translates into
the assumption that the coefficients $M_{it} = \delta x_i(t)$ are
independent, identically distributed random variables†.
We denote by $\rho_C(\lambda)$ the density of eigenvalues of C, defined as:

$$\rho_C(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}, \qquad (2)$$

where $n(\lambda)$ is the number of eigenvalues of C less than
$\lambda$. Interestingly, if M is an N × T random matrix, $\rho_C(\lambda)$
is exactly known in the limit $N \to \infty$, $T \to \infty$ and
$Q = T/N \geq 1$ fixed [ref], and reads:



$$\rho_C(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \qquad \lambda_{\max,\min} = \sigma^2 \left(1 + 1/Q \pm 2\sqrt{1/Q}\right), \qquad (3)$$

with $\lambda \in [\lambda_{\min}, \lambda_{\max}]$, and where $\sigma^2$ is equal to the variance
of the elements of M [ref], equal to 1 with our normalisation.
In the limit Q = 1 the normalised eigenvalue
density of the matrix M is the well known Wigner
semi-circle law, and the corresponding distribution of the
square of these eigenvalues (that is, the eigenvalues of C)
is then indeed given by Eq. (3) for Q = 1.

*In the following we assume that the average value of the
$\delta x$'s has been subtracted off, and that the $\delta x$'s are rescaled to
have a constant unit volatility.

†Note that even if the 'true' correlation matrix $C_{\rm true}$ is the
identity matrix, its empirical determination from a finite time
series will generate non-trivial eigenvectors and eigenvalues.

The most important features predicted by Eq. (3) are:

• the fact that the lower 'edge' of the spectrum is
strictly positive (except for Q = 1); there are therefore
no eigenvalues between 0 and $\lambda_{\min}$. Near this
edge, the density of eigenvalues exhibits a sharp
maximum, except in the limit Q = 1 ($\lambda_{\min} = 0$)
where it diverges as $\sim 1/\sqrt{\lambda}$.

• the density of eigenvalues also vanishes above a certain
upper edge $\lambda_{\max}$.
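
The two features listed above can be checked numerically by evaluating the density formula directly. The following is a minimal Python sketch, assuming NumPy is available; the function names `mp_edges` and `mp_density` are illustrative and not taken from the paper, and the edge expression $\sigma^2(1 + 1/Q \pm 2\sqrt{1/Q})$ is the standard one for this density.

```python
import numpy as np

def mp_edges(Q, sigma2=1.0):
    """Lower and upper spectrum edges, sigma2 * (1 + 1/Q -/+ 2*sqrt(1/Q))."""
    lam_min = sigma2 * (1.0 + 1.0 / Q - 2.0 * np.sqrt(1.0 / Q))
    lam_max = sigma2 * (1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q))
    return lam_min, lam_max

def mp_density(lam, Q, sigma2=1.0):
    """Null-hypothesis eigenvalue density; zero outside [lam_min, lam_max]."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lam_min, lam_max = mp_edges(Q, sigma2)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    rho[inside] = (Q / (2.0 * np.pi * sigma2)) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho

Q = 3.22  # value quoted in the text for the S&P 500 data
lam_min, lam_max = mp_edges(Q)

# Trapezoidal integration on a fine grid between the two edges
grid = np.linspace(lam_min, lam_max, 200001)
rho = mp_density(grid, Q)
dx = grid[1] - grid[0]
total_mass = float(np.sum(0.5 * (rho[1:] + rho[:-1])) * dx)
mean_eig = float(np.sum(0.5 * (grid[1:] * rho[1:] + grid[:-1] * rho[:-1])) * dx)

print(f"edges: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"integral of density: {total_mass:.4f}, mean eigenvalue: {mean_eig:.4f}")
```

For Q = 3.22 and unit variance the edges come out at roughly 0.20 and 2.43. The density should integrate to one and have mean $\sigma^2$, which gives a simple sanity check on any implementation of the formula.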

Note that the above results are only valid in the limit
$N \to \infty$. For finite N, the singularities present at
both edges are smoothed: the edges become somewhat
blurred, with a small probability of finding eigenvalues
above $\lambda_{\max}$ and below $\lambda_{\min}$, which goes to zero when N
becomes large.
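
The null hypothesis and its finite-N behaviour can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the sizes N = 200 and T = 800, the Gaussian distribution of the entries and the fixed random seed are arbitrary choices, not values from the paper. It builds C from strictly uncorrelated series, normalised as in the footnote, and counts the eigenvalues falling outside the theoretical edges.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
N, T = 200, 800                 # illustrative sizes (assumption), Q = T / N = 4
Q = T / N

# Null hypothesis: N independent series of length T
M = rng.standard_normal((N, T))

# Normalisation of the footnote: zero mean and unit volatility for each series
M -= M.mean(axis=1, keepdims=True)
M /= M.std(axis=1, keepdims=True)

# Empirical correlation matrix C = (1/T) M M^T and its spectrum (ascending)
C = M @ M.T / T
eigvals = np.linalg.eigvalsh(C)

# Theoretical edges for sigma^2 = 1
lam_min = 1.0 + 1.0 / Q - 2.0 * np.sqrt(1.0 / Q)
lam_max = 1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q)

frac_outside = float(np.mean((eigvals < lam_min) | (eigvals > lam_max)))
print(f"theoretical edges: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"empirical range:   [{eigvals[0]:.3f}, {eigvals[-1]:.3f}]")
print(f"fraction of eigenvalues outside the edges: {frac_outside:.3f}")
```

Although the 'true' correlation matrix is the identity here, the empirical eigenvalues spread over an interval close to the theoretical one instead of concentrating at 1; only a small fraction, if any, should fall outside the edges.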

Now, we want to compare the empirical distribution of 
the eigenvalues of the correlation matrix of stocks corresponding
to different markets with the theoretical prediction
given by Eq. (3), based on the assumption that the
correlation matrix is random. We have studied numeri- 
cally the density of eigenvalues of the correlation matrix 
of N = 406 assets of the S&P 500, based on daily vari- 
ations during the years 1991-96, for a total of T = 1309 
days (the corresponding value of Q is 3.22). 

An immediate observation is that the highest eigenvalue
$\lambda_1$ is 25 times larger than the predicted $\lambda_{\max}$; see
Fig. 1, inset. (The corresponding eigenvector is, as expected,
the 'market' itself, i.e. it has roughly equal components
on all the N stocks.) The simplest 'pure noise'
hypothesis is therefore inconsistent with the value of $\lambda_1$.
A more reasonable idea is that the components of the
correlation matrix which are orthogonal to the 'market'
are pure noise. This amounts to subtracting the contribution
of the highest eigenvalue $\lambda_1$ from the nominal value $\sigma^2 = 1$, leading to
$\sigma^2 = 1 - \lambda_1/N = 0.85$. The corresponding fit of the
empirical distribution is shown as a dotted line in Fig. 1.
Several eigenvalues are still above $\lambda_{\max}$ and might
contain some information, thereby reducing the variance
of the effectively random part of the correlation matrix.
One can therefore treat $\sigma^2$ as an adjustable parameter.
The best fit is obtained for $\sigma^2 = 0.74$, and corresponds
to the dark line in Fig. 1, which accounts quite satisfactorily
for 94% of the spectrum, while the 6% highest
eigenvalues still exceed the theoretical upper edge by a
substantial amount. Note that a still better fit could be
obtained by allowing for a slightly smaller effective value
of Q, which could account for the existence of volatility
correlations [ref].
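
The arithmetic behind the value 0.85 can be retraced as follows. This is a rough check rather than a reproduction of the fit: the exact highest eigenvalue is not quoted in the text, so it is reconstructed here from the statement that it is about 25 times the predicted upper edge, and the subtracted quantity is read as that highest empirical eigenvalue. The helper name `upper_edge` is illustrative.

```python
import numpy as np

N, T = 406, 1309   # values quoted in the text
Q = T / N          # about 3.22

def upper_edge(Q, sigma2):
    """Upper edge of the null-hypothesis spectrum for a given variance sigma2."""
    return sigma2 * (1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q))

lam_max_pure = upper_edge(Q, 1.0)   # edge under the simplest 'pure noise' hypothesis
lam_1 = 25.0 * lam_max_pure         # assumption: "25 times larger", taken at face value

# Variance left for the part of the matrix orthogonal to the 'market' mode
sigma2_market = 1.0 - lam_1 / N

# The edge scales linearly with sigma^2, so lowering sigma^2 moves it to the left
edge_085 = upper_edge(Q, 0.85)
edge_074 = upper_edge(Q, 0.74)

print(f"Q = {Q:.2f}, pure-noise upper edge = {lam_max_pure:.3f}")
print(f"sigma^2 after removing the market mode = {sigma2_market:.2f}")
print(f"upper edge for sigma^2 = 0.85: {edge_085:.2f}, for 0.74: {edge_074:.2f}")
```

With these inputs the remaining variance comes out close to 0.85, consistent with the value used for the dotted line in Fig. 1.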

We have repeated the above analysis on different stock 
markets (e.g. Paris) and found very similar results. In a 
first approximation, the location of the theoretical edge,
determined by fitting the part of the density which contains
most of the eigenvalues, allows one to distinguish
'information' from 'noise'. However, a more careful study 
should be undertaken, in particular to treat adequately 
the finite N effects. 
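
The separation of 'information' from 'noise' by the theoretical edge can be sketched on synthetic data. Everything in the example below is an assumption made for illustration: the one-factor model (a common 'market' series plus independent noise), the factor loading 0.5, the sizes N = 100 and T = 400 and the seed. It does not use the S&P 500 data, so the fractions it prints are not comparable to the 6% and 94% quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
N, T = 100, 400                 # illustrative sizes (assumption)
Q = T / N

# Hypothetical one-factor model: common 'market' move plus idiosyncratic noise
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))

# Zero mean and unit volatility for each series
returns -= returns.mean(axis=1, keepdims=True)
returns /= returns.std(axis=1, keepdims=True)

C = returns @ returns.T / T
eigvals = np.linalg.eigvalsh(C)  # ascending order, eigvals[-1] is the largest

# Variance left once the contribution of the highest eigenvalue is subtracted
sigma2 = 1.0 - eigvals[-1] / N
lam_max = sigma2 * (1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q))

# Eigenvalues above the edge are candidates for carrying information
above = eigvals > lam_max
frac_count = float(above.mean())
frac_variance = float(eigvals[above].sum() / eigvals.sum())

print(f"sigma^2 = {sigma2:.2f}, upper edge = {lam_max:.2f}")
print(f"eigenvalues above the edge: {frac_count:.1%} of the total number")
print(f"share of the total variance they carry: {frac_variance:.1%}")
```

In this toy setting the 'market' eigenvalue lies far above the edge and carries a sizeable share of the variance, while almost all other eigenvalues stay below it. A bulk eigenvalue may occasionally cross the edge because of finite N effects, which is the caveat raised in the text.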




FIG. 1. Smoothed density of the eigenvalues of C, where 
the correlation matrix C is extracted from N = 406 assets of 
the S&P500 during the years 1991-1996. For comparison we 
have plotted the density Eq. (3) for Q = 3.22 and $\sigma^2 = 0.85$:
this is the theoretical value obtained assuming that the matrix
is purely random except for its highest eigenvalue (dotted
line). A better fit can be obtained with a smaller value of
$\sigma^2 = 0.74$ (solid line), corresponding to 74% of the total variance.
Inset: same plot, but including the highest eigenvalue
corresponding to the 'market', which is found to be 30 times
greater than $\lambda_{\max}$.

The idea that the low lying eigenvalues are essentially 
random can also be tested by studying the statistical 
structure of the corresponding eigenvectors. The i-th component
of the eigenvector corresponding to the eigenvalue
$\lambda_\alpha$ will be denoted as $v_{\alpha,i}$. We can normalise it such that
$\sum_{i=1}^{N} v_{\alpha,i}^2 = N$. If there is no information contained in
the eigenvector $v_{\alpha,i}$, one expects that for a fixed $\alpha$, the
distribution of $u = v_{\alpha,i}$ (as i is varied) is a maximum
entropy distribution, such that $\overline{u^2} = 1$. This leads to
the so-called Porter-Thomas distribution in the theory
of random matrices:

$$P(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right). \qquad (4)$$

As shown in Fig. 2, this distribution fits extremely well 
the empirical histogram of the eigenvector components, 
except for those corresponding to the highest eigenvalues, 
which lie beyond the theoretical edge $\lambda_{\max}$. We show in
the inset the distribution of u's for the highest eigenvalue,
which is markedly different from the 'no information' assumption,
Eq. (4).
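
The 'no information' assumption for eigenvector components can be checked on a purely random matrix. The sketch below is an illustration only: the sizes, the seed, the choice of five eigenvectors around the middle of the spectrum and the histogram binning are assumptions, and the name `porter_thomas` simply labels the Gaussian density of Eq. (4).

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
N, T = 200, 800                 # illustrative sizes (assumption)

# Correlation matrix of strictly uncorrelated, normalised series
M = rng.standard_normal((N, T))
M -= M.mean(axis=1, keepdims=True)
M /= M.std(axis=1, keepdims=True)
C = M @ M.T / T

# eigh returns unit-norm eigenvectors as columns; rescale so that sum_i v_i^2 = N
eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs * np.sqrt(N)

# Pool the components of five eigenvectors from the middle of the spectrum
mid = N // 2
u = v[:, mid - 2: mid + 3].ravel()

second_moment = float(np.mean(u ** 2))  # equals 1 by the normalisation above
fourth_moment = float(np.mean(u ** 4))  # close to 3 for a Gaussian distribution

def porter_thomas(x):
    """Gaussian density of Eq. (4), with zero mean and unit variance."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

# Compare the empirical histogram with Eq. (4) at the bin centres
hist, bin_edges = np.histogram(u, bins=20, range=(-4.0, 4.0), density=True)
centres = 0.5 * (bin_edges[1:] + bin_edges[:-1])
max_dev = float(np.max(np.abs(hist - porter_thomas(centres))))

print(f"second moment: {second_moment:.3f}, fourth moment: {fourth_moment:.2f}")
print(f"largest deviation between histogram and Eq. (4): {max_dev:.3f}")
```

As in Fig. 2, there is no adjustable parameter in this comparison: the normalisation fixes the second moment, and the remaining deviations should be of the size expected from the finite number of components.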

We have finally studied correlation matrices corre- 
sponding not to price variations but to the (time depen- 
dent) volatilities of the different stocks, determined from 
the study of intraday fluctuations. These matrices should 
contain some relevant information for option trading and 
hedging. The obtained results are again very similar to 
those shown in Figs. 1 and 2.



FIG. 2. Distribution of the eigenvector components
$u = v_{\alpha,i}$, for five different eigenvectors well inside the interval
$[\lambda_{\min}, \lambda_{\max}]$, and comparison with the 'no information'
assumption, Eq. (4). Note that there are no adjustable parameters.
Inset: Plot of the same quantity for the highest
eigenvalue, showing marked differences with the theoretical
prediction (dashed line), which is indeed expected.

To summarise, we have shown that results from the
theory of random matrices (well documented in the
physics literature [ref]) are of great interest to understand
the statistical structure of the empirical correlation matrices.
The central result of the present study is the remarkable
agreement between the theoretical prediction
and empirical data concerning both the density of eigenvalues
and the structure of eigenvectors of the empirical
correlation matrices corresponding to several major stock
markets. Indeed, in the case of the S&P 500, 94% of
the total number of eigenvalues fall in the region where
the theoretical formula (3) applies. Hence, less than 6%
of the eigenvectors, which are responsible for 26% of the
total volatility, appear to carry some information. This
method might be very useful to extract the relevant correlations
between financial assets of various types, with interesting
potential applications to risk management and
portfolio optimisation. It is clear from the present study
that Markowitz's portfolio optimisation scheme based on
a purely historical determination of the correlation matrix
is not adequate, since its lowest eigenvalues (corresponding
to the smallest risk portfolios) are dominated
by noise.